Accent type and phrase boundary estimation using acoustic and language models for automatic prosodic labeling
Abstract
This paper proposes an automatic prosodic labeling technique for constructing speech databases for speech synthesis. In corpus-based Japanese speech synthesis, it is essential to use speech data annotated with prosodic information such as phrase boundaries and accent types. However, manual annotation is generally time-consuming and expensive. To overcome this problem, we propose a technique that estimates accent types and phrase boundaries from the speech waveform and its transcribed text using both language and acoustic models. We use a conditional random field (CRF) as the language model and an HMM as the acoustic model; HMMs have been shown to be effective for prosody modeling in speech synthesis. Because the HMM models the continuously changing features of F0 contours well, the proposed technique achieves higher estimation accuracy than conventional techniques that use a simple polygonal line approximation of F0 contours.
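The idea of combining a language-model score with an acoustic-model score can be illustrated with a toy example. The following is a minimal sketch under loudly stated assumptions, not the authors' implementation. The CRF is replaced by fixed log-probabilities per accent type. The acoustic model is a single left-to-right Gaussian HMM over log F0 per accent type, with hand-set, hypothetical state means rather than trained models. The interpolation `weight` and all function names (`hmm_loglik`, `estimate_accent_type`) are hypothetical. The sketch covers only accent type selection for one accent phrase, not phrase boundary estimation.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def hmm_loglik(obs, means, var=0.01, stay=0.7):
    """Forward algorithm for a left-to-right Gaussian HMM (toy model).

    Starts in state 0, may stay or advance one state per frame,
    and must end in the last state. var and stay are hypothetical values.
    """
    log_stay, log_move = math.log(stay), math.log(1.0 - stay)
    n = len(means)
    alpha = [-math.inf] * n
    alpha[0] = log_gauss(obs[0], means[0], var)
    for x in obs[1:]:
        new_alpha = []
        for j in range(n):
            prev = alpha[j] + log_stay  # stay in state j
            if j > 0:
                prev = log_add(prev, alpha[j - 1] + log_move)  # advance from j-1
            new_alpha.append(prev + log_gauss(x, means[j], var))
        alpha = new_alpha
    return alpha[-1]

def estimate_accent_type(log_f0, lm_logprob, templates, weight=1.0):
    """Pick the accent type maximizing LM log-prob + weight * HMM log-likelihood.

    lm_logprob stands in for CRF output; templates maps each accent type
    to hypothetical per-state mean log-F0 values.
    """
    def score(acc):
        return lm_logprob[acc] + weight * hmm_loglik(log_f0, templates[acc])
    return max(templates, key=score)

# Hypothetical state means (log F0) for a 3-mora accent phrase.
TEMPLATES = {
    0: [5.0, 5.2, 5.2],  # type 0 (unaccented): low start, then stays high
    1: [5.3, 5.0, 4.9],  # type 1: high on the first mora, then falls
    2: [5.0, 5.3, 4.9],  # type 2: rises, then falls after the second mora
}
uniform_lm = {acc: math.log(1 / 3) for acc in TEMPLATES}
frames = [5.3, 5.3, 5.0, 5.0, 4.9, 4.9]
print(estimate_accent_type(frames, uniform_lm, TEMPLATES))  # → 1
```

With a uniform language model the acoustic evidence decides; setting `weight=0.0` makes the choice depend only on the language-model scores. Unlike a polygonal line fit, the HMM scores every frame against the contour's evolving state sequence, which is the property the abstract credits for the accuracy gain.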
Similar resources
Automatic prosodic labeling of accent information for Japanese spoken sentences
This paper describes a method of automatic labeling of prosodic information focusing on accent types and accent phrase boundaries for Japanese spoken sentences. They are predicted by CRF (Conditional Random Fields) using linguistic information and F0 contour information. In the prediction of the accent type, we propose a method that uses a provisional accent type predicted by linguistic informa...
Combining acoustic, lexical, and syntactic evidence for automatic unsupervised prosody labeling
Automatic labeling of prosodic events in speech has potentially significant implications for spoken language processing applications, and has received much attention over the years, especially after the introduction of annotation standards such as ToBI. Current labeling techniques are based on supervised learning, relying on the availability of a corpus that is annotated with the prosodic label...
An Intonational Phrase Boundary and Pitch Accent Dependent Speech Recognizer
Does prosody help word recognition? In this paper, we propose a novel probabilistic framework in which word and phoneme are dependent on prosody in a way that improves word recognition. We describe the idea of prosody dependent speech recognition by building a prosody dependent speech recognizer that conditions word and phoneme models on two important prosodic variables: intonational phrase bou...
Exploiting Acoustic and Syntactic Features for Prosody Labeling in a Maximum Entropy Framework
In this paper we describe an automatic prosody labeling framework that exploits both language and speech information. We model the syntactic-prosodic information with a maximum entropy model that achieves an accuracy of 85.2% and 91.5% for pitch accent and boundary tone labeling on the Boston University Radio News corpus. We model the acoustic-prosodic stream with two different models, one a max...
Analysis of Inconsistencies in Cross-Lingual Automatic ToBI Tonal Accent Labeling
This paper presents an experimental study on how corpus-based automatic prosodic information labeling can be transferred from a source language to a different target language. Tone accent identification models trained for Spanish, using the ESMA corpus, are used to automatically assign tonal accent ToBI labels on the (English) Boston Radio news corpus, and vice versa. Using just local raw proso...
Publication date: 2014